VARIANCE BOUNDING MARKOV CHAINS

By Gareth O. Roberts and Jeffrey S. Rosenthal

Lancaster University and University of Toronto 

We introduce a new property of Markov chains, called variance 
bounding. We prove that, for reversible chains at least, variance bound- 
ing is weaker than, but closely related to, geometric ergodicity. Fur- 
thermore, variance bounding is equivalent to the existence of usual 
central limit theorems for all $L^2$ functionals. Also, variance bounding
(unlike geometric ergodicity) is preserved under the Peskun order. 
We close with some applications to Metropolis-Hastings algorithms. 

1. Introduction. Markov chain Monte Carlo (MCMC) algorithms are 
widely used in statistics, physics, and computer science. Measures of how 
good an MCMC algorithm is include quantitative bounds on convergence 
to stationarity (e.g., [14, 15, 34, 35]), qualitative convergence rates such as 
geometric ergodicity (e.g., [21, 29, 32, 39, 40]), the existence of central limit 
theorems (e.g., [2, 3, 7, 10, 13, 21, 40]) and bounds on asymptotic variance 
of estimators (e.g., [7, 22, 41]). 

In this paper we introduce a new notion, variance bounding. Roughly, a 
Markov chain is variance bounding if the asymptotic variances for function- 
als with unit stationary variance are uniformly bounded (precise definitions 
are given below). We shall show that, for reversible chains at least, variance 
bounding is implied by geometric ergodicity, and conversely, if $P$ is
variance bounding, then $aI + (1-a)P$ is geometrically ergodic for all
$0 < a < 1$. More importantly, we shall prove that a reversible Markov chain
is variance bounding if and only if all $L^2$ functionals satisfy a usual
central limit
theorem, indicating that variance bounding is in some sense the "right" def- 
inition to use. We also prove that variance bounding is preserved under the 
Peskun partial ordering ([26, 40]) on Markov chains. Finally, applications to 
Metropolis-Hastings algorithms are presented.

2. Variance bounding. Given a Markov chain kernel $P$ on a state space
$(\mathcal{X}, \mathcal{F})$ with unique stationary distribution $\pi(\cdot)$,
we let $\{X_n\}$ follow the kernel $P$ in stationarity, so that
$\mathbf{P}[X_n \in A] = \pi(A)$ for all $A \in \mathcal{F}$ and
$n \in \mathbb{N} \cup \{0\}$, and also
$\mathbf{P}[X_n \in A \mid X_0, \ldots, X_{n-1}] = P(X_{n-1}, A)$ for all
$A \in \mathcal{F}$ and all $n \in \mathbb{N}$.
For a functional $h : \mathcal{X} \to \mathbb{R}$ (assumed throughout to be
measurable), the stationary variance is given by
$\mathrm{Var}_\pi(h) = \mathbf{E}[(h(X_0) - \mathbf{E}[h(X_0)])^2]$, and the
asymptotic variance is given by

(1)  $\mathrm{Var}(h, P) = \lim_{n \to \infty} \frac{1}{n}\, \mathrm{Var}\Bigl(\sum_{i=1}^{n} h(X_i)\Bigr).$

If the Markov chain $P$ is to be used to estimate the stationary expected
value of $h$ by $\frac{1}{n} \sum_{i=1}^{n} h(X_i)$, then $\mathrm{Var}(h, P)$
is a measure of the Monte Carlo uncertainty of the estimate. Thus, for MCMC
algorithms, it is desirable to make $\mathrm{Var}(h, P)$ as small as possible
(cf. [7, 22, 23, 40, 41]). This prompts the following definition.

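Definition. $P$ is variance bounding if there is $K < \infty$ such that
$\mathrm{Var}(h, P) \le K\, \mathrm{Var}_\pi(h)$ for all
$h : \mathcal{X} \to \mathbb{R}$. Equivalently, $P$ is variance bounding if
$\sup\{\mathrm{Var}(h, P);\ h : \mathcal{X} \to \mathbb{R},\ \mathrm{Var}_\pi(h) = 1\} < \infty$.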

Note that in the case where $\mathrm{Var}_\pi(h) = \infty$, the required
inequality holds automatically for all $K$.
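
As a concrete illustration of these definitions, here is a minimal sketch (not taken from the paper) for a two-state chain with hypothetical jump probabilities `p` and `q`. It assumes $p, q > 0$ and $p + q < 2$, so that the chain is irreducible and aperiodic and the asymptotic variance equals the stationary variance plus twice the sum of the lagged covariances. The function name and the truncation length `n_terms` are illustrative choices only.

```python
def two_state_variances(p, q, h0, h1, n_terms=2000):
    """Return (stationary variance, truncated asymptotic variance) for the
    two-state chain with P(0,1) = p and P(1,0) = q and functional h = (h0, h1).
    """
    pi0, pi1 = q / (p + q), p / (p + q)       # stationary distribution
    mean = pi0 * h0 + pi1 * h1
    var_pi = pi0 * (h0 - mean) ** 2 + pi1 * (h1 - mean) ** 2
    lam = 1 - p - q                            # the eigenvalue of P other than 1
    # For a two-state chain the centred functional is an eigenfunction of P,
    # so in stationarity Cov(h(X_0), h(X_k)) = var_pi * lam**k.
    asym = var_pi + 2 * sum(var_pi * lam ** k for k in range(1, n_terms))
    return var_pi, asym


var_pi, asym = two_state_variances(0.1, 0.1, 0.0, 1.0)
ratio = asym / var_pi   # close to (1 + lam) / (1 - lam) with lam = 0.8
```

In this toy case the ratio of asymptotic to stationary variance is $(1+\lambda)/(1-\lambda)$ for every nonconstant $h$, so the supremum in the definition is finite and the chain is variance bounding.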

Variance bounding is a natural property, in that it offers some control over 
the asymptotic variances $\mathrm{Var}(h, P)$. We study its relation to more traditional
MCMC properties below. For most of our results, we assume that P is 
reversible with respect to $\pi(\cdot)$, that is, that

(2)  $\int_{x \in A} \pi(dx)\, P(x, B) = \int_{x \in B} \pi(dx)\, P(x, A), \qquad A, B \in \mathcal{F}.$

It follows from [16] (see also [3]) that, for reversible chains and $L^2$
functionals, the limit in equation (1) always exists, though it may be
infinite.

3. Relation to geometric ergodicity. Recall that a Markov chain kernel
$P$ with stationary distribution $\pi(\cdot)$ is geometrically ergodic if
there is $\rho < 1$ and $M : \mathcal{X} \to [0, \infty]$ which is
$\pi$-a.e. finite [i.e., such that
$\pi\{x \in \mathcal{X} : M(x) < \infty\} = 1$], such that
$|P^n(x, A) - \pi(A)| \le M(x) \rho^n$ for all $A \in \mathcal{F}$,
$n \in \mathbb{N}$, and $x \in \mathcal{X}$.
Geometric ergodicity is an often studied property (e.g., [21, 29, 32, 39, 40]),
which leads to many useful results, such as central limit theorems (see next
section).

However, geometric ergodicity is an overly strong notion in that it
requires, among other things, that the Markov chain be aperiodic. Since
estimates of functionals, and their variances $\mathrm{Var}(h, P)$, are
essentially unaffected by periodicity considerations, it seems inappropriate
to demand aperiodicity. And indeed, many Markov chains are variance bounding
despite being periodic (e.g., the Markov chain $P_1$ in Example 9 below).

We now explore the relation between geometric ergodicity and variance 
bounding. We first show that, for reversible chains, variance bounding is 
strictly weaker than geometric ergodicity. (Proofs of all theorems are de- 
ferred until Section 7.) 

Theorem 1. If P is reversible and geometrically ergodic, then P is 
variance bounding. 

Next, we show that $P$ is variance bounding if and only if any mixture
of $P$ with the identity is geometrically ergodic. We write $I$ for the
identity kernel, that is, the Markov chain which never moves, so that
$I(x, \{x\}) = 1$ for all $x \in \mathcal{X}$.

Theorem 2. If P is reversible, then the following are equivalent: 

(i) P is variance bounding. 

(ii) $aI + (1-a)P$ is geometrically ergodic for all $0 < a < 1$.

(iii) $aI + (1-a)P$ is geometrically ergodic for some $0 < a < 1$.

Corollary 3. If $P$ is reversible, then for any fixed $0 < a < 1$, the
following are equivalent:

(i) $P$ is variance bounding.

(ii) $aI + (1-a)P$ is variance bounding.

Section 6 below contains some applications of Theorems 1 and 2. We next 
note that if P has holding probabilities uniformly bounded away from 0, 
then variance bounding and geometrically ergodic are equivalent: 

Theorem 4. If $P$ is reversible and
$\inf_{x \in \mathcal{X}} P(x, \{x\}) > 0$, then $P$ is variance bounding if
and only if $P$ is geometrically ergodic.

As an application of Theorem 4, suppose $P$ represents a random-walk
Metropolis or systematic-scan Metropolis-within-Gibbs algorithm on
$\mathbb{R}^d$, with proposal increment densities positive in a neighborhood
of 0, whose target density $t$ is $C^1$ with
$\|\nabla \log t(x)\| \ge \delta > 0$ for all $x \in \mathcal{X}$. It then
follows as in [33] that the rejection probabilities $P(x, \{x\})$ are
uniformly bounded away from 0. Hence, by Theorem 4, variance bounding is
equivalent to geometric ergodicity in this case.

Similarly, the two notions are equivalent if the operator $P$ is positive,
that is, if $\mathbf{E}[f(X_0) f(X_1)] \ge 0$ for all measurable
$f : \mathcal{X} \to \mathbb{R}$ when $\{X_n\}$ is in stationarity:

Theorem 5. If $P$ is reversible and positive, then $P$ is variance bounding
if and only if $P$ is geometrically ergodic.

As an application of Theorem 5, suppose $P$ represents a data augmentation
algorithm, that is, the $x$-coordinate (only) of a two-variable Gibbs
sampler. It follows from Lemmas 3-1 and 3-2 of [18] that P is reversible and 
positive. Hence, by Theorem 5, variance bounding is equivalent to geometric 
ergodicity in this case as well. (See also [11].) 

In particular, the slice sampler (e.g., [24, 25, 30]) can be viewed as the 
x-coordinate of a two-variable Gibbs sampler. (This holds for product slice 
samplers as well, since the multiple auxiliary variables are conditionally 
independent and can be regarded as a single auxiliary vector.) So, for any 
slice sampler, variance bounding is equivalent to geometric ergodicity. For 
example, it is known [30] that the slice sampler is geometrically ergodic 
whenever $Q'(y)\, y^{1+1/\alpha}$ is nonincreasing near 0, for some $\alpha > 1$, where $Q(y)$
is the measure of the set where the target density value is at least y. It 
follows immediately that the slice sampler is also variance bounding under 
these conditions. 

In general, if $P$ is variance bounding, then a slight modification of $P$
is geometrically ergodic. Specifically, following [36], let $\bar{P}^n$ be
the binomial modification of $P$, corresponding to doing an (independently
chosen) random number $B_n$ of steps from $P$, where
$B_n \sim \mathrm{Binomial}(2n, 1/2)$. Thus,
$\bar{P}^n = 2^{-2n} \sum_{j=0}^{2n} \binom{2n}{j} P^j$. Call $\bar{P}$
geometrically ergodic if, as usual, there is $\rho < 1$ and $\pi$-a.e. finite
$M : \mathcal{X} \to [0, \infty]$ such that
$|\bar{P}^n(x, A) - \pi(A)| \le M(x) \rho^n$ for all $A \in \mathcal{F}$,
$n \in \mathbb{N}$, and $x \in \mathcal{X}$. Then we have the following.

Theorem 6. If $P$ is reversible, then $P$ is variance bounding if and only
if $\bar{P}$ is geometrically ergodic.
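
To see why the binomial modification removes periodicity, one can look at what it does to a single eigenvalue $\lambda$ of $P$. The following sketch (an illustration, not part of the paper; the function name is hypothetical) evaluates $2^{-2n} \sum_j \binom{2n}{j} \lambda^j$ exactly and compares it with $((1+\lambda)/2)^{2n}$, which is what the binomial theorem predicts.

```python
from fractions import Fraction
from math import comb


def binomial_modification_eigenvalue(lam, n):
    """Eigenvalue of the binomial modification when P has eigenvalue lam:
    2^{-2n} * sum_{j=0}^{2n} C(2n, j) * lam^j (computed exactly)."""
    total = sum(comb(2 * n, j) * lam ** j for j in range(2 * n + 1))
    return total / Fraction(4) ** n


# An eigenvalue of -1 (the signature of a period-2 chain) is mapped to 0.
periodic = binomial_modification_eigenvalue(Fraction(-1), 3)
# A generic eigenvalue lam is mapped to ((1 + lam) / 2) ** (2 * n).
generic = binomial_modification_eigenvalue(Fraction(1, 3), 2)
```

Eigenvalues near $-1$ are sent near 0, while eigenvalues bounded away from $+1$ stay bounded away from $+1$, which matches the role of $\sup(\sigma(P))$ in the results of Section 7.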

Remark. The stationary processes literature (e.g., [2, 12, 13]) defines
many other mixing conditions, such as $\alpha$-mixing, $\beta$-mixing,
$\rho$-mixing, $\phi$-mixing, etc. These conditions are related to usual
Markov chain ergodicity conditions; for example, $\phi$-mixing is equivalent
to uniform ergodicity, exponentially fast $\beta$-mixing is equivalent to
geometric ergodicity, $\alpha$-mixing is implied by Harris ergodicity, etc.
However, none of these mixing conditions is implied by variance bounding,
since the mixing conditions all require ergodicity, whereas periodic (and
therefore nonergodic) chains can still be variance bounding.

4. Relation to central limit theorems. An important issue in MCMC is 
the existence of central limit theorems (e.g., [2, 3, 7, 10, 13, 40]). Where 
central limit theorems are known to hold, they underpin practical MCMC 
strategies for Monte Carlo error assessment (see, e.g., [8]).

Say that a functional $h : \mathcal{X} \to \mathbb{R}$ with
$\pi(|h|) < \infty$ [where $\pi(f) = \int_{\mathcal{X}} f(x)\, \pi(dx)$]
satisfies a usual central limit theorem (CLT) for a Markov chain $P$
if, as $n \to \infty$, the distribution of
$n^{-1/2} \sum_{i=1}^{n} [h(X_i) - \pi(h)]$ converges weakly
to $N(0, v)$, where (with $\{X_n\}$ in stationarity)

(3)  $v = \mathrm{Var}(h(X_0)) + 2 \sum_{i=1}^{\infty} \mathrm{Cov}(h(X_0), h(X_i)) < \infty.$

(We say "usual" to distinguish this convergence from, e.g., convergence to
other distributions, or other normalizations besides $n^{-1/2}$; see also
[3, 7, 27, 37].)

It is known ([12], Theorem 18.5.3; see also [3], [10]) that if $P$ is
geometrically ergodic, then $h$ satisfies a usual CLT, provided
$\pi(|h|^{2+\delta}) < \infty$ for some $\delta > 0$. It was proven in [28],
following [16], that if $P$ is geometrically ergodic and reversible, then $h$
satisfies a usual CLT whenever $\pi(h^2) < \infty$. However,
geometric ergodicity is an overly strong assumption; for example, periodic 
Markov chains can never be geometrically ergodic but they can still satisfy 
CLTs. 

The following theorem shows that, for reversible Markov chains, vari- 
ance bounding is the "right" definition for CLTs, that is, variance bounding 
(unlike geometric ergodicity) is the weakest property which still guarantees 
usual CLTs for all $L^2$ functionals. (We assume the stationary distribution
for P is unique, to avoid degenerate cases where the state space breaks up 
into multiple closed subsets.) 

Theorem 7. If $P$ is reversible, with unique stationary distribution
$\pi(\cdot)$, then $P$ is variance bounding if and only if every
$h : \mathcal{X} \to \mathbb{R}$ with $\pi(h^2) < \infty$ satisfies a usual
CLT for $P$.

Remark. There are other results available (see, e.g., [3] and the refer- 
ences therein) which guarantee CLTs for specific functionals, rather than 
for all $L^2$ functionals. However, often MCMC is used to generate samples
from $\pi(\cdot)$ before it is decided which functionals are of statistical
interest. Thus, we find that it is most useful having results like Theorem 7
which apply to all $L^2$ functionals simultaneously.

5. Relation to the Peskun ordering. The following partial order on Markov 
chain kernels was introduced by Peskun [26] for finite state spaces, and later 
by Tierney [40] for general state spaces. 

Definition. Let $P_1$ and $P_2$ be two Markov chain kernels on
$(\mathcal{X}, \mathcal{F})$, both having invariant probability measure
$\pi$. Then $P_1$ dominates $P_2$ off the diagonal, written
$P_1 \succeq P_2$, if $P_1(x, A) \ge P_2(x, A)$ for all $x \in \mathcal{X}$
and $A \in \mathcal{F}$ with $x \notin A$.

It was proved by Peskun [26] for finite state spaces, and then by
Tierney [41] (see also [22, 23]) for general state spaces, that if
$P_1 \succeq P_2$, and $P_1$ and $P_2$ are reversible with respect to the
same $\pi(\cdot)$, then $\mathrm{Var}(h, P_1) \le \mathrm{Var}(h, P_2)$
for all $h : \mathcal{X} \to \mathbb{R}$. That is, $P_1$ is "better" than
$P_2$, in the sense of being
uniformly more efficient for estimating expectations of functionals. Thus, it 
seems reasonable that any Markov chain property designed to indicate good 
estimation should be preserved under the Peskun ordering. For the variance 
bounding property, that is indeed the case: 

Theorem 8. If $P_1$ and $P_2$ are both reversible with respect to
$\pi(\cdot)$, and $P_1 \succeq P_2$, and $P_2$ is variance bounding, then
$P_1$ is variance bounding.

On the other hand, the corresponding property for geometric ergodic- 
ity does not hold, indicating another advantage of variance bounding over 
geometric ergodicity: 

Example 9. Let $\mathcal{X} = \mathbb{Z}$ with $\pi(m) = 2^{-|m|}/3$.
Define $P_1$ by $P_1(x, x-1) = 2/3$ and $P_1(x, x+1) = 1/3$ for $x > 0$,
and $P_1(x, x-1) = 1/3$ and $P_1(x, x+1) = 2/3$ for $x < 0$, and
$P_1(0, -1) = P_1(0, 1) = 1/2$. Also, let $P_2$ be the Metropolis algorithm
for $\pi(\cdot)$ with proposal distribution $Q(x, x+1) = Q(x, x-1) = 1/2$.
[Thus, $P_2(x, x+1) = P_2(x, x) = 1/4$ and $P_2(x, x-1) = 1/2$ for $x > 0$;
$P_2(x, x-1) = P_2(x, x) = 1/4$ and $P_2(x, x+1) = 1/2$ for $x < 0$; and
$P_2(0, -1) = P_2(0, 1) = 1/4$ and $P_2(0, 0) = 1/2$.] Then both $P_1$ and
$P_2$ are reversible with respect to $\pi(\cdot)$, and also
$P_1 \succeq P_2$. Furthermore, it follows as in Mengersen and Tweedie [19]
that $P_2$ is geometrically ergodic, and hence variance bounding by
Theorem 1. On the other hand, $P_1$ is periodic, and hence cannot be
geometrically ergodic, even though $P_1 \succeq P_2$. (Of course, $P_1$
is still variance bounding, by Theorem 8.)
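
The transition probabilities quoted for the Metropolis kernel $P_2$ can be recomputed directly from the Metropolis rule. The sketch below is an illustration only (the function names are hypothetical); it uses exact rational arithmetic to build one row of $P_2$ from $\pi(m) = 2^{-|m|}/3$ and the symmetric proposal $Q(x, x \pm 1) = 1/2$.

```python
from fractions import Fraction


def pi(m):
    """Target mass pi(m) = 2^{-|m|} / 3 on the integers."""
    return Fraction(1, 3 * 2 ** abs(m))


def metropolis_row(x):
    """Return (P2(x, x-1), P2(x, x), P2(x, x+1)) for the Metropolis
    algorithm with proposal Q(x, x-1) = Q(x, x+1) = 1/2."""
    half = Fraction(1, 2)
    down = half * min(1, pi(x - 1) / pi(x))   # propose x-1, accept w.p. min(1, ratio)
    up = half * min(1, pi(x + 1) / pi(x))     # propose x+1, accept w.p. min(1, ratio)
    return down, 1 - down - up, up


row_positive = metropolis_row(3)    # a state x > 0
row_zero = metropolis_row(0)
row_negative = metropolis_row(-2)   # a state x < 0
```

The same helper can be used to check detailed balance for $P_2$, namely $\pi(x) P_2(x, x+1) = \pi(x+1) P_2(x+1, x)$, at any finite set of states.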

6. Application to Metropolis-Hastings algorithms. We now consider
Metropolis-Hastings algorithms ([9, 20]). We define a slight generalization,
as follows. Given a reference measure $\nu(\cdot)$ on $\mathcal{X}$, with
respect to which $\pi(dx) = t(x)\, \nu(dx)$, and a nonnegative (measurable)
function $q : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ with
$\int_{\mathcal{X}} q(x, y)\, \nu(dy) \le 1$ for all $x \in \mathcal{X}$, the
sub-Metropolis-Hastings algorithm is the algorithm with transition kernel

$M_q(x, dy) = \alpha(x, y)\, q(x, y)\, \nu(dy) + r(x)\, \delta_x(dy),$

where $\alpha(x, y) = \min\Bigl(1, \frac{t(y)\, q(y, x)}{t(x)\, q(x, y)}\Bigr)$,
and $r(x) = 1 - \int_{\mathcal{X}} \alpha(x, y)\, q(x, y)\, \nu(dy) \ge 0$.

By construction, this algorithm is reversible with respect to $\pi(\cdot)$.
It may be described as follows. With probability
$\int_{\mathcal{X}} q(x, y)\, \nu(dy)$, it performs the usual
Metropolis-Hastings algorithm with proposal density
$q(x, y) / \int_{\mathcal{X}} q(x, y)\, \nu(dy)$. Otherwise, with probability
$1 - \int_{\mathcal{X}} q(x, y)\, \nu(dy)$, it stays at its current state.
If $\int_{\mathcal{X}} q(x, y)\, \nu(dy) = 1$, then $M_q$ is the usual
Metropolis-Hastings algorithm.
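
The construction above can be sketched on a finite state space. The following is an illustrative assumption-laden example, not the paper's code: it takes the reference measure to be counting measure on $\{0, \ldots, n-1\}$, the function name `sub_mh_kernel` is hypothetical, and the particular `t` and `q` are made-up inputs chosen so that the proposal mass from each state is $1/2 < 1$.

```python
from fractions import Fraction


def sub_mh_kernel(t, q):
    """Transition matrix of the sub-Metropolis-Hastings kernel M_q on the
    finite space {0, ..., n-1}, with counting measure as reference measure.
    t[x] is a positive target density and q[x][y] is nonnegative with
    sum_y q[x][y] <= 1 for every x (both given as Fractions)."""
    n = len(t)
    M = [[Fraction(0)] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if y != x:
                # alpha(x, y) q(x, y) = min(q(x, y), t(y) q(y, x) / t(x))
                M[x][y] = min(q[x][y], t[y] * q[y][x] / t[x])
        # All remaining mass stays at x: r(x) plus any self-proposal q(x, x).
        M[x][x] = 1 - sum(M[x])
    return M


t = [Fraction(1), Fraction(2), Fraction(3)]
quarter = Fraction(1, 4)
q = [[Fraction(0), quarter, quarter],
     [quarter, Fraction(0), quarter],
     [quarter, quarter, Fraction(0)]]
M = sub_mh_kernel(t, q)
```

On such a toy example one can confirm exactly that each row of `M` sums to 1 and that `t[x] * M[x][y] == t[y] * M[y][x]`, which is the reversibility claimed above.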
By direct inspection, noting that
$\alpha(x, y)\, q(x, y) = \min\bigl(q(x, y), \frac{t(y)}{t(x)}\, q(y, x)\bigr)$,
we see the following:

Proposition 10. For fixed $\nu(\cdot)$ and $t$, if
$q_1(x, y) \ge q_2(x, y)$ for all $x, y \in \mathcal{X}$ with $x \ne y$, then
$M_{q_1} \succeq M_{q_2}$. (Hence, by Theorem 8, if $M_{q_2}$ is variance
bounding, then so is $M_{q_1}$.)

Now, suppose $M_{q_2}$ is variance bounding, and that
$q_1(x, y) \ge c\, q_2(x, y)$ for all $x \ne y$, for some $c > 0$. We can
assume [by replacing $c$ with $\min(c, 1)$ if necessary] that $c \le 1$.
Then $M_{c q_2} = c M_{q_2} + (1 - c) I$. Hence, by Corollary 3
(with $a = 1 - c$; the case $c = 1$ is trivial), $M_{c q_2}$ is also variance
bounding. It then follows from Proposition 10 that $M_{q_1}$ is also variance
bounding. We conclude:

Corollary 11. If $q_1(x, y) \ge c\, q_2(x, y)$ for all
$x, y \in \mathcal{X}$ with $x \ne y$, for some $c > 0$, and if $M_{q_2}$ is
variance bounding, then $M_{q_1}$ is variance bounding.

Example 9 above shows that the analogous statement to Corollary 11 for 
geometric ergodicity does not hold. 

To continue, call a (measurable) function $s : \mathcal{X} \to [0, \infty)$
MT-good if it is symmetric, positive and continuous, with exponentially
bounded tails, and with $\int_{-\infty}^{\infty} s(u)\, du = 1$. Then a
result of Mengersen and Tweedie [19] (see also [32] for higher-dimensional
analogs) says that a random-walk Metropolis algorithm on
$\mathcal{X} = \mathbb{R}$, with proposal density $q(x, y) = s(y - x)$ for some
MT-good s, is geometrically ergodic provided the target density has expo- 
nentially bounded tails. This is a very impressive result, but with the severe 
restriction that the proposal increments must correspond to a symmetric 
random walk. To improve this, we make the following definition. 

Definition. A proposal density function
$q : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is a uniformly
minorized increment distribution (UMID) if there is $c > 0$ and MT-good
$s : \mathcal{X} \to [0, \infty)$ such that $q(x, y) \ge c\, s(y - x)$ for all
$x, y \in \mathcal{X}$.

Combining Theorem 1 and Corollary 11 with the result of [19] immedi- 
ately gives the following: 

Corollary 12. Let t be a target density with exponentially bounded 
tails, and let q be a UMID proposal density function. Then M q is variance 
bounding. 

Note that in Corollary 12, we do not need to assume that $s$ has
exponentially bounded tails, since if not then we can simply replace $s(x)$
by $\min(s(x), e^{-|x|})$ without affecting the conclusion. Note also that
Corollary 12 does not require the proposal density $q$ to be symmetric, nor
to correspond to a random walk. (Similar generalizations are also available
for the multidimensional case, as in [32].)

As one application of Corollary 12, consider a Langevin (MALA) algorithm
(see [33]), with proposal density given by
$Q(x, \cdot) = N\bigl(x + \tfrac{1}{2} \delta\, \nabla \log t(x),\ \delta^2\bigr)$
for some $\delta > 0$. Now, if the target density $t$ is $C^1$ with tails that
are precisely exponential, then $\nabla \log t(x)$ is a bounded function of
$x \in \mathcal{X}$, and it follows easily that $q$ is UMID. We conclude:

Corollary 13. A Langevin algorithm for a $C^1$ target density on
$\mathcal{X} = \mathbb{R}$ with exponential tails is variance bounding.

As a final application, we consider a Metropolis-Hastings algorithm for a
density $t$ supported on $(0, \infty)$, with proposal distribution given by
$Q(x, \cdot) = N(x, x^b)$ for some fixed $b > 0$. That is, the variance of the
proposal increment depends on the current state $x \in \mathcal{X}$.
(Related models were considered in [31].)

If $b > 2$, then as $x \to \infty$, the proposal values will be farther and
farther out in the tails, so $\lim_{x \to \infty} P(x, \{x\}) = 1$. It
follows as in [32], or by a simple capacitance argument (e.g., [17]), that
the resulting Markov chain is neither geometrically ergodic nor variance
bounding. So, we do not consider that case further here. (On the other hand,
numerical simulations related to [31] indicate that if $t$ is, e.g., a Cauchy
distribution, then values $b \approx 2.7$ may give fastest numerical
convergence, which is a separate but related issue.)

If $b = 2$, then the distribution $Q(x, \cdot)$ equals the distribution of
$x + xZ$, where $Z \sim N(0, 1)$. Taking logarithms (cf. [31]) gives rise to
an equivalent chain which is an ordinary random-walk Metropolis algorithm,
with modified target density $\tilde{t}(y) = e^y\, t(e^y)$, and with increment
density $f(u)$ equal to the density of $\log(1 + Z)$ where $Z \sim N(0, 1)$.
This increment distribution is clearly UMID; indeed, we can simply let
$c\, s(u) = \min(f(u), f(-u))$. Hence, by Corollary 12, the transformed chain
(and hence also the original chain) is variance bounding, provided that the
modified target density $\tilde{t}$ has exponentially bounded tails.

Finally, suppose that $0 < b < 2$. Then $Q(x, \cdot)$ is the distribution of
$x + x^{b/2} Z$, where $Z \sim N(0, 1)$. Instead of logarithms, consider the
transformation $x \mapsto x^a$, where $a = 1 - b/2$ (so $0 < a < 1$). Then the
proposal increment from $x \in \mathcal{X}$ transforms from $x^{b/2} Z$ to
$W = h(Z) = [x + x^{b/2} Z]^a - x^a = x^a (1 + x^{-a} Z)^a - x^a$.
Inverting this, $Z = h^{-1}(W) = x^a \bigl((1 + W x^{-a})^{1/a} - 1\bigr)$.
Now, the density of $Z$ is $g(z) = (2\pi)^{-1/2} e^{-z^2/2}$. Hence, for the
transformed chain, the proposal increment $W$ has density

$\frac{g(h^{-1}(w))}{(dw/dz)} = \frac{\exp\bigl(-\tfrac{1}{2} \bigl[x^a \bigl((1 + w x^{-a})^{1/a} - 1\bigr)\bigr]^2\bigr)}{\sqrt{2\pi}\; a\, (1 + x^{-a} z)^{a-1}}, \qquad z = h^{-1}(w).$

We compute that, as $x \to \infty$, this expression converges to
$(2\pi)^{-1/2} a^{-1} e^{-(w/a)^2/2}$, that is, to the density function of the
$N(0, a^2)$ distribution. [Intuitively, this is because
$\bigl(\frac{d}{dx} x^a\bigr)^2 x^b = a^2 x^{2a-2} x^b = a^2$ is constant, so
the increment variance of the transformed chain is approximately
stabilized.] Hence, for large enough $x$, and thus for all $x$ by positivity
and continuity, the proposal density is UMID. Therefore, by Corollary 12, the
transformed and original chains are variance bounding, provided that the
transformed target density has exponentially bounded tails.
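
The limiting claim can be checked numerically. The sketch below is illustrative only (the function names and the test values of `w`, `x` and `b` are arbitrary choices, not from the paper); it evaluates the change-of-variables density of the transformed increment $W$ and compares it with the $N(0, a^2)$ density at a large current state $x$.

```python
import math


def transformed_increment_density(w, x, b):
    """Density at w of W = (x + x^{b/2} Z)^a - x^a with a = 1 - b/2 and
    Z ~ N(0, 1), obtained from g(h^{-1}(w)) / (dw/dz)."""
    a = 1.0 - b / 2.0
    base = 1.0 + w * x ** (-a)
    if base <= 0.0:
        return 0.0                      # w is outside the range of h
    z = x ** a * (base ** (1.0 / a) - 1.0)           # z = h^{-1}(w)
    dw_dz = a * (1.0 + x ** (-a) * z) ** (a - 1.0)   # derivative of h at z
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * dw_dz)


def normal_density(w, sd):
    """Density of N(0, sd^2) at w."""
    return math.exp(-0.5 * (w / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


# With b = 1 we have a = 1/2; at a large state x the two densities nearly agree.
gap = abs(transformed_increment_density(0.3, 1e8, 1.0) - normal_density(0.3, 0.5))
```

For moderate $x$ the two densities differ visibly, which is why the argument above needs positivity and continuity to extend the minorization from large $x$ to all $x$.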

7. Spectra and theorem proofs. We now proceed to the proofs of the
theorems. We begin by recalling some standard notation. Let $P$ be a Markov
chain kernel with stationary distribution $\pi(\cdot)$ on a state space
$(\mathcal{X}, \mathcal{F})$. For measurable
$f, g : \mathcal{X} \to \mathbb{R}$, write
$\langle f, g \rangle = \int_{\mathcal{X}} f(x) g(x)\, \pi(dx)$, and
$\|f\| = \langle f, f \rangle^{1/2}$. Let
$L_0^2(\pi) = \{f : \mathcal{X} \to \mathbb{R};\ \pi(f) = 0,\ \pi(f^2) < \infty\}$,
and regard $P$ as an operator acting on $L_0^2(\pi)$, by
$(Pf)(x) = \int_{\mathcal{X}} f(y)\, P(x, dy)$. Write $\sigma(P)$ for the
spectrum of the operator $P$ acting on $L_0^2(\pi)$ (see, e.g., [4, 36]). If
$P$ is a reversible Markov chain, then $P$ is a self-adjoint operator with
respect to $\langle \cdot, \cdot \rangle$, and also
$\sigma(P) \subseteq [-1, 1]$ (cf. [1, 7]). Theorem 2 of [28] says that if
$P$ is reversible, then $P$ is geometrically ergodic if and only if there is
$r < 1$ with $\sigma(P) \subseteq [-r, r]$.

We have the following. 

Theorem 14. If $P$ is reversible, then $P$ is variance bounding if and
only if $\sup(\sigma(P)) < 1$.

Proof. Suppose first that $\sup(\sigma(P)) = \Lambda < 1$. Then by
Proposition 1 of [36],
$\mathrm{Var}(h, P) \le 2 (1 - \Lambda)^{-1}\, \mathrm{Var}_\pi(h)$ for all
$h : \mathcal{X} \to \mathbb{R}$. Hence, $P$ is variance bounding with
constant $K = 2 (1 - \Lambda)^{-1} < \infty$.

Conversely, suppose $\sup(\sigma(P)) = 1$. Let $\mathcal{E}$ be the spectral
measure for $P$ (see, e.g., [4, 7, 28, 36, 38]), and let $r < 1$. Then
$\mathcal{E}((r, 1])$ is nonzero, so there is $h \in L_0^2(\pi)$ in the range
of $\mathcal{E}((r, 1])$. It follows similarly to Proposition 1 of [36]
(cf. [7, 16]) that
$\mathrm{Var}(h, P) \ge \frac{1+r}{1-r}\, \mathrm{Var}_\pi(h)$. Since this
holds for any $r < 1$, it follows that
$\sup_{h \in L_0^2(\pi)} [\mathrm{Var}(h, P) / \mathrm{Var}_\pi(h)] \ge \sup_{r < 1} \frac{1+r}{1-r} = \infty$.
Hence, $P$ is not variance bounding. □

Proof of Theorem 1. If $P$ is reversible and geometrically ergodic,
then there is $r < 1$ with $\sigma(P) \subseteq [-r, r]$. In particular,
$\sup(\sigma(P)) \le r < 1$, so $P$ is variance bounding by Theorem 14. □

Proof of Theorem 2. (i) $\Rightarrow$ (ii): Suppose $P$ is variance
bounding, and $0 < a < 1$. Then by Theorem 14, $\sup(\sigma(P)) < 1$, that is,
there is $c < 1$ with $\sigma(P) \subseteq [-1, c]$. On the other hand,

$\sigma(aI + (1-a)P) = \{\lambda \in \mathbb{R} \text{ s.t. } (aI + (1-a)P - \lambda I) \text{ is not invertible}\}$
$\qquad = \bigl\{\lambda \in \mathbb{R} \text{ s.t. } (1-a)\bigl(P - \tfrac{\lambda - a}{1-a} I\bigr) \text{ is not invertible}\bigr\}$
$\qquad = \bigl\{\lambda \in \mathbb{R} \text{ s.t. } \tfrac{\lambda - a}{1-a} \in \sigma(P)\bigr\}$
$\qquad = \{(a + (1-a) y) \text{ s.t. } y \in \sigma(P)\},$

where the last equality follows by solving for $\lambda$ in the equation
$y = \frac{\lambda - a}{1-a}$. Hence, since $\sigma(P) \subseteq [-1, c]$, it
follows that

$\sigma(aI + (1-a)P) \subseteq [a + (1-a)(-1),\ a + (1-a)c] = [2a - 1,\ a + (1-a)c] \subseteq [-r, r],$

where $r = \max(|2a - 1|,\ a + (1-a)c) < 1$. Hence, by Theorem 2 of [28],
$aI + (1-a)P$ is geometrically ergodic.

(ii) $\Rightarrow$ (iii): Immediate.

(iii) $\Rightarrow$ (i): If $aI + (1-a)P$ is geometrically ergodic, then
there is $r < 1$ with $\sigma(aI + (1-a)P) \subseteq [-r, r]$. But from the
above,

$\sigma(aI + (1-a)P) = \bigl\{\lambda \in \mathbb{R} \text{ s.t. } \tfrac{\lambda - a}{1-a} \in \sigma(P)\bigr\},$

so it follows that
$\sigma(P) \subseteq \bigl[\frac{-r-a}{1-a}, \frac{r-a}{1-a}\bigr]$. In
particular, $\sup(\sigma(P)) \le \frac{r-a}{1-a} < 1$, so $P$ is variance
bounding. □
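
The affine map $y \mapsto a + (1-a) y$ on the spectrum can be seen on the smallest periodic example. The sketch below is an illustration under stated assumptions (a two-state deterministic flip chain, hypothetical helper names); it uses the fact that a $2 \times 2$ stochastic matrix has eigenvalues 1 and (trace $- 1$), so the latter is the only point of the spectrum on $L_0^2(\pi)$.

```python
from fractions import Fraction


def second_eigenvalue(P):
    """Eigenvalue other than 1 of a 2x2 stochastic matrix: trace minus 1."""
    return P[0][0] + P[1][1] - 1


def lazy_mixture(P, a):
    """Return the kernel a*I + (1 - a)*P for a 2x2 matrix P."""
    return [[a * (1 if i == j else 0) + (1 - a) * P[i][j] for j in range(2)]
            for i in range(2)]


flip = [[Fraction(0), Fraction(1)], [Fraction(1), Fraction(0)]]   # period 2
a = Fraction(3, 10)
lam_flip = second_eigenvalue(flip)                    # equals -1
lam_mixed = second_eigenvalue(lazy_mixture(flip, a))  # equals a + (1 - a) * lam_flip
```

Here the flip chain has $\sup(\sigma(P)) = -1 < 1$ but $\sigma(P) \not\subseteq [-r, r]$ for any $r < 1$, while the mixture has its eigenvalue at $2a - 1$, strictly inside $(-1, 1)$.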



Proof of Corollary 3. We see from the proof of Theorem 2 that

$\sup(\sigma(aI + (1-a)P)) = a + (1-a) \sup(\sigma(P)).$

It follows that for $0 < a < 1$, $\sup(\sigma(aI + (1-a)P)) < 1$ if and only
if $\sup(\sigma(P)) < 1$. The result then follows from Theorem 14. □



Proof of Theorem 4. If $P$ is reversible and geometrically ergodic,
then $P$ is variance bounding by Theorem 1. Conversely, suppose $P$ is
reversible and variance bounding, with
$\delta = \inf_{x \in \mathcal{X}} P(x, \{x\}) > 0$. Let
$S(x, A) = (1 - \delta)^{-1} (P(x, A) - \delta\, \mathbf{1}_{x \in A})$. Then
$S$ is another Markov chain kernel on $\mathcal{X}$, and
$P = \delta I + (1 - \delta) S$. It follows that
$\inf \sigma(P) \ge \delta + (1 - \delta)(-1) = 2\delta - 1 > -1$. Also
$\sup \sigma(P) < 1$ by Theorem 14. Hence, there is $r < 1$ with
$\sigma(P) \subseteq [-r, r]$, so $P$ is geometrically ergodic. □



Proof of Theorem 5. Note that
$\mathbf{E}[f(X_0) f(X_1)] = \langle f, Pf \rangle$, so positivity is
equivalent to $\langle f, Pf \rangle \ge 0$ for all $f \in L_0^2(\pi)$. This
implies that $\lambda \ge 0$ for all $\lambda \in \sigma(P)$. Hence, using
Theorem 14,

$P$ is geometrically ergodic $\iff \sup\{|\lambda| : \lambda \in \sigma(P)\} < 1$
$\qquad \iff \sup\{\lambda : \lambda \in \sigma(P)\} < 1$
$\qquad \iff P$ is variance bounding. □

Proof of Theorem 6. Note that we can write
$\bar{P}^n = \bigl(\tfrac{1}{2}(I + P)\bigr)^{2n}$.
Hence, the result follows immediately from Theorem 2 (with $a = 1/2$). □

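Proof of Theorem 7. If $P$ is variance bounding, then
$\Lambda = \sup(\sigma(P)) < 1$. Let $\mathcal{E}$ be the spectral measure for
$P$, and let $\mathcal{E}_h$ be the induced measure defined by

$\mathcal{E}_h(S) = \int_{\mathcal{X}} h(x)\, (\mathcal{E}(S) h)(x)\, \pi(dx).$

Then it follows (cf. [7]) that

$\int_{-1}^{1} \frac{1+\lambda}{1-\lambda}\, \mathcal{E}_h(d\lambda) \le \frac{1+\Lambda}{1-\Lambda} \int_{-1}^{1} \mathcal{E}_h(d\lambda) < \infty.$

It then follows from Kipnis and Varadhan [16] (see also [3]) that $h$
satisfies a usual CLT for $P$.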

Conversely, if $P$ is not variance bounding, then $\Lambda = 1$. It follows
as in the proof of Theorem 14 that $\mathcal{E}((r, 1])$ is nonzero for every
$r < 1$. Since $P$ has unique stationary distribution, 1 is not an eigenvalue
of $P$ on $L_0^2(\pi)$, so there must be infinitely many $m \in \mathbb{N}$
such that $\mathcal{E}((1 - 2^{-m}, 1 - 2^{-m-1}])$ is nonzero. Let
$m_1 < m_2 < \cdots$ (so $m_i \ge i$) with
$\mathcal{E}((1 - 2^{-m_i}, 1 - 2^{-m_i - 1}])$ nonzero. Let
$g_i \in L_0^2(\pi)$ be in the range of
$\mathcal{E}((1 - 2^{-m_i}, 1 - 2^{-m_i - 1}])$, with $\|g_i\| = 1$. Then
spectral theory implies that the $\{g_i\}$ are orthonormal, and furthermore,
$\mathrm{Cov}(g_i, P g_i) = \langle g_i, P g_i \rangle \ge 1 - 2^{-m_i}$.
Finally, let $h = \sum_{i=1}^{\infty} 2^{-i/2} g_i$. Then by orthonormality,

$\mathrm{Var}_\pi(h) = \|h\|^2 = \sum_{i=1}^{\infty} (2^{-i/2})^2 = 1 < \infty.$

On the other hand, with $\{X_n\}$ in stationarity, again using orthonormality,

$\mathrm{Cov}(h(X_0), h(X_n)) = \sum_{i=1}^{\infty} 2^{-i}\, \mathrm{Cov}(g_i, P^n g_i) \ge \sum_{i=1}^{\infty} 2^{-i} (1 - 2^{-m_i})^n \ge \sum_{i=1}^{\infty} 2^{-i} (1 - 2^{-i})^n.$

Hence

$\sum_{n=0}^{\infty} \mathrm{Cov}(h(X_0), h(X_n)) \ge \sum_{i=1}^{\infty} 2^{-i} \sum_{n=0}^{\infty} (1 - 2^{-i})^n = \sum_{i=1}^{\infty} 2^{-i} [1 - (1 - 2^{-i})]^{-1} = \sum_{i=1}^{\infty} (1) = \infty.$

It follows that $v$ in (3) is infinite, so $h$ does not satisfy a usual CLT
for $P$. □
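
The divergence in the last display comes from each index $i$ contributing exactly 1 to the double sum. As a small sanity check (an illustration only, with a hypothetical function name), the following sketch sums the first `num_terms` contributions exactly, using the closed form of the inner geometric series.

```python
from fractions import Fraction


def covariance_sum_lower_bound(num_terms):
    """Sum over i = 1..num_terms of 2^{-i} * sum_{n>=0} (1 - 2^{-i})^n,
    using the closed form 1 / (1 - (1 - 2^{-i})) for the inner series."""
    total = Fraction(0)
    for i in range(1, num_terms + 1):
        weight = Fraction(1, 2 ** i)
        geometric_sum = 1 / (1 - (1 - weight))   # equals 2^i
        total += weight * geometric_sum          # each term contributes 1
    return total


partial = covariance_sum_lower_bound(10)   # grows linearly in num_terms
```

The partial sums therefore grow without bound, even though the weights $2^{-i}$ defining $h$ are summable.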



Proof of Theorem 8. Lemma 3 of Tierney [41] says that since
$P_1 \succeq P_2$, therefore $P_2 - P_1$ is a positive operator. It follows
that $\sup(\sigma(P_2)) \ge \sup(\sigma(P_1))$. Hence, using Theorem 14 twice,
if $P_2$ is variance bounding, then $\sup(\sigma(P_2)) < 1$, so
$\sup(\sigma(P_1)) < 1$, so $P_1$ is variance bounding. [Alternatively, by
Theorem 4 of [41],
$\mathrm{Var}(h, P_1) \le \mathrm{Var}(h, P_2) \le K\, \mathrm{Var}_\pi(h)$.] □

Remark 1. The above theorems have all been proven for reversible 
chains only. However, it seems likely that analogs of some of them (e.g. 
Theorem 1) carry over in some form to nonreversible chains, about which 
various facts about convergence are known (see, e.g., [5, 6, 18, 23]). We leave 
this as an open problem for future work. 

8. Summary. This paper defined a Markov chain to be variance bound- 
ing if the asymptotic variances for functionals with unit stationary variance 
are uniformly bounded. For reversible chains, we proved that this property 
is weaker than geometric ergodicity, but equivalent to $aI + (1-a)P$ being
geometrically ergodic for all $0 < a < 1$. Furthermore, in contrast to
geometric ergodicity, the variance bounding property: allows for periodicity;
is equivalent to all $L^2$ functionals satisfying a usual central limit
theorem; and is
preserved under the Peskun [26] partial ordering on Markov chains. We also 
presented some applications to Metropolis-Hastings MCMC algorithms, and 
showed how variance bounding could apply more easily and more generally 
than geometric ergodicity. 

Overall, we view these results as indicating that as a property to use in 
the study of MCMC algorithms, variance bounding is similar to, but more 
convenient than, geometric ergodicity. We hope that the notion of variance 
bounding can be used to further understand Markov chains and MCMC 
algorithms in other contexts. 


